SEIO 2025
A. González Romero, C. Lancho Martín, Á. Novillo, V. Aceña, J. García-Ochoa, I. Martín de Diego
Data Science Laboratory, Universidad Rey Juan Carlos
Foundations
Applications
Famous examples:
Student and domain expert: Alonso González Romero
This work applies data science techniques to analyze performance in competitive swimming, using results from the 2024 World Swimming Championships (Budapest) in a 25-meter pool.
Data source: Omega Timing
Omega is the official technical sponsor of international swimming events and provides high-precision race data.
⟶ Clustering
Datasets can be categorized based on their temporal availability:
Traditional clustering algorithms assume full access to the entire dataset from the start.
Streaming data introduces an implicit temporal dependency: observations at time \(t\) are often related to those at time \(t-1\), creating an evolving structure. Evolutionary clustering algorithms process data sequentially and incorporate past information to update cluster structures in real time.
Main goal: to detect race breaks
Based on: EvolveCluster: an evolutionary clustering algorithm for streaming data (Nordahl et al. 2022)
Adapted to swimming context: observations = swimmer gaps over time
Algorithm EvolveCluster (Nordahl et al. 2022)
\(D\) is a continuous stream of data, segmented into time-based chunks \(D_0, D_1, \ldots, D_t, t\rightarrow \infty\).
\(D_0\) is partitioned into \(k\) clusters (via \(k\)-means): \(C_{0}=\{C_{00},\dots, C_{0k}\}\).
For each segment \(D_t, t\neq0\), \(k\)-means is initialized using centroids from \(C_{t-1}\)
Centroids of \(C_{t-1}\) are removed and empty clusters are deleted.
New centroids are calculated and the partition \(C_t\) is refined:
Should any cluster be split in two? Apply \(2\)-means to each cluster, using its two furthest points as initial centroids, to obtain \(C'_t\)
The two candidate partitions are compared using a validation measure (e.g. the Silhouette index, SI): \(\text{If } \text{SI}(C'_t) > \text{SI}(C_t) + \tau \Rightarrow C_t \leftarrow C'_t\)
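The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation from Nordahl et al. (2022): the data are 1-D (as with time gaps), the previous centroids are used only as \(k\)-means seeds rather than being added to the segment and later removed, each split is tested one cluster at a time, and a single-cluster partition is given the worst SI score. All function names and sample values are hypothetical.

```python
from statistics import mean

def assign(points, centroids):
    """Group each 1-D point with its nearest centroid; empty clusters are deleted."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
        clusters[nearest].append(p)
    return [c for c in clusters if c]

def kmeans_1d(points, centroids, iters=20):
    """Lloyd iterations warm-started from the given centroids."""
    clusters = assign(points, centroids)
    for _ in range(iters):
        new_clusters = assign(points, [mean(c) for c in clusters])
        if new_clusters == clusters:
            break
        clusters = new_clusters
    return clusters

def silhouette(clusters):
    """Mean Silhouette index; points in singleton clusters score 0."""
    if len(clusters) < 2:
        return -1.0  # assumption: a one-cluster partition gets the worst score
    scores = []
    for i, c in enumerate(clusters):
        for p in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            a = sum(abs(p - q) for q in c) / (len(c) - 1)
            b = min(mean([abs(p - q) for q in other])
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

def try_split(clusters, tau):
    """Accept a 2-means split of a cluster only if SI improves by more than tau."""
    current = clusters
    for target in list(clusters):
        if len(set(target)) < 2:
            continue
        # In 1-D the two furthest points of a cluster are its min and max.
        halves = kmeans_1d(target, [min(target), max(target)])
        candidate = [c for c in current if c is not target] + halves
        if silhouette(candidate) > silhouette(current) + tau:
            current = candidate
    return current

def evolve_step(prev_clusters, segment, tau=0.05):
    """Cluster segment D_t seeded by the centroids of C_{t-1}, then refine."""
    seeds = [mean(c) for c in prev_clusters]
    return try_split(kmeans_1d(segment, seeds), tau)

# Made-up segments for illustration only.
c0 = kmeans_1d([0.0, 0.1, 0.2, 1.0, 1.1, 1.2], [0.0, 1.2])
c1 = evolve_step(c0, [0.0, 0.2, 0.3, 1.5, 1.6, 3.0, 3.1])
print(c1)
```

In this toy run the second segment starts with two clusters inherited from \(C_0\); the trailing cluster is split because doing so raises the SI by more than \(\tau\), while splitting the leading cluster is rejected.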
Figure extracted from (Nordahl et al. 2022)
Goal: Detect race breaks
Input: Time gap from race leader
Let \(x_i\) and \(x_{i+1}\) be consecutive swimmers in cluster \(C_l\). If \(d(x_i, x_{i+1}) > \tau\), split the cluster at that point.
\(\tau\) can also be selected following a validation measure (SI, Dunn index, etc.)
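The gap rule can be illustrated with a short Python sketch. It assumes \(d(x_i, x_{i+1})\) is the absolute difference between the two swimmers' time gaps to the leader, and that swimmers are ordered by that gap; the function name, the sample gaps, and the value of \(\tau\) are hypothetical and not taken from the Omega Timing data.

```python
def split_on_gaps(gaps, tau):
    """Split a cluster wherever consecutive swimmers are more than tau apart.

    gaps: time gaps to the race leader (seconds) for the swimmers in one cluster.
    Returns a list of sub-clusters, each a list of gaps in ascending order.
    """
    ordered = sorted(gaps)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > tau:
            groups.append([cur])    # break detected: start a new group
        else:
            groups[-1].append(cur)  # same pack as the previous swimmer
    return groups

# Made-up gaps (seconds behind the leader) for one time point.
example = split_on_gaps([0.0, 0.12, 0.20, 0.95, 1.05, 2.40], tau=0.5)
print(example)  # [[0.0, 0.12, 0.2], [0.95, 1.05], [2.4]]
```

With \(\tau = 0.5\) s the six swimmers fall into three packs, since only the gaps between the third and fourth and between the fifth and sixth swimmers exceed the threshold.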
🖥️ Live demo 🤞🏻
⟶ Lack of dynamism
Despite the limited data, our analysis provides:
carmen.lancho@urjc.es
Questions?
SEIO 2025, June 10, 2025